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Computing smoothing distributions, the distributions of one or 
more states conditional on past, present, and future observations is 
a recurring problem when operating on general hidden Markov mod- 
els. The aim of this paper is to provide a foundation of particle-based 
approximation of such distributions and to analyze, in a common uni- 
fying framework, different schemes producing such approximations. 
In this setting, general convergence results, including exponential de- 
viation inequalities and central limit theorems, are established. In 
particular, time uniform bounds on the marginal smoothing error 
are obtained under appropriate mixing conditions on the transition 
kernel of the latent chain. In addition, we propose an algorithm ap- 
proximating the joint smoothing distribution at a cost that grows 
only linearly with the number of particles. 

1. Introduction. Statistical inference in general state space hidden Mar- 
kov models (HMM) involves computation of the posterior distribution of 

a set X s . s i = f [X S) . . . ,Xgi] of state variables conditional on a record Yq : t = 
Do-t of observations. This distribution will, in the following, be denoted by 
4>s:s'\T where the dependence of this measure on the observed values uq-t 
is implicit. The posterior distribution can be expressed in closed- form only 
in very specific cases, principally, when the state space model is linear and 
Gaussian or when the state space of the hidden Markov chain is a finite 
set. In the vast majority of cases, nonlinearity or non-Gaussianity render 
analytic solutions intractable [3, 26, 33, 36]. 
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This limitation has led to an increase of interest in alternative compu- 
tational strategies handling more general state and measurement equations 
without constraining a priori the behavior of the posterior distributions. 
Among these, sequential Monte Carlo (SMC) methods play a central role. 
SMC methods — in which the sequential importance sampling and sampling 
importance resampling methods proposed by [23] and [35], respectively, are 
combined — refer to a class of algorithms approximating a sequence of prob- 
ability distributions, defined on a sequence of probability spaces, by updating 
recursively a set of random particles with associated nonnegative importance 
weights. The SMC methodology has emerged as a key tool for approximating 
state posterior distribution flows in general state space models; see [9, 10, 12] 
for general introductions as well as theoretical results for SMC methods and 
[17, 31, 33] for applications of SMC within a variety of scientific fields. 

The recursive formulas generating the filter distributions 4>t (short-hand 
notation for 4>t-.t\t) an d the joint smoothing distributions <Pq : t\t are closely 
related; thus, executing the standard SMC scheme in the filtering mode pro- 
vides, as a by-product, approximations of the joint smoothing distributions. 
More specifically, the branches of the genealogical tree associated with the 
historical evolution of the filtering particles up to time step T form, when 
combined with the corresponding importance weights of these filtering par- 
ticles, a weighted sample approximating the joint smoothing distribution 
0O:T|r; see [9], Section 3.4, for details. From these paths, one may readily 
obtain a weighted sample targeting the fixed lag or fixed interval smoothing 
distribution by extracting the required subsequence of states while retain- 
ing the weights. This appealingly simple scheme can be used successfully 
for estimating the joint smoothing distribution for small values of T or any 
marginal smoothing distribution 4> s \t, with s <T, when s and T are close; 
however, when T is large and s <^.T, the associated particle approxima- 
tions are inaccurate since the genealogical tree degenerates gradually as the 
interacting particle system evolves [20, 21]. 

In this article, we thus give attention to more sophisticated approaches 
and consider instead the forward filtering backward smoothing (FFBSm) 
algorithm and the forward filtering backward simulation (FFBSi) sampler. 
These algorithms share some similarities with the Baum-Welch algorithm 
for finite state space models and the Kalman filter-based smoother and sim- 
ulation smoother for linear Gaussian state space models [8]. In the FFBSm 
algorithm, the particle weights obtained when approximating the filter dis- 
tributions in a forward filtering pass are modified in a backward pass; see 
[18, 24, 27]. The FFBSi algorithm simulates, conditionally independently 
given the particles and particle weights produced in a similar forward filter- 
ing pass, state trajectories being approximately distributed according to the 
joint smoothing distribution; see [21]. 
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The computational complexity of the FFBSm algorithm when used for 
estimating marginal fixed interval smoothing distributions or of the original 
formulation of the FFBSi sampler grows (in most situations) as the square 
of the number N of particles multiplied by the time horizon T. To alleviate 
this potentially very large computational cost, some methods using intri- 
cate data structures for storing the particles have been developed; see, for 
example, [28]. These algorithms have a complexity of order 0(Nlog(N)) 
and are thus amenable to practical applications; however, this reduction in 
complexity comes at the cost of introducing some level of approximation. 

In this paper, a modification of the original FFBSi algorithm is presented. 
The proposed scheme has a complexity that grows only linearly in N and 
does not involve any numerical approximation techniques. This algorithm 
may be seen as an alternative to a recent proposal by [20] which is based on 
the so-called two- filter algorithm [2]. 

The smoothing weights computed in the backward pass of the FFBSm 
algorithm at a given time instant s (or the law of the FFBSi algorithm) 
are statistically dependent on all forward filtering pass particles and weights 
computed before and after this time instant. This intricate dependence struc- 
ture makes the analysis of the resulting particle approximation challenging; 
up to our best knowledge, only a single consistency result is available in [21], 
but its proof is plagued by a (subtle) mistake that seems difficult to correct. 
Therefore, very little is known about the convergence of the schemes under 
consideration, and the second purpose of this paper is to fill this gap. 2 In 
this contribution, we focus first on finite time horizon approximations. Given 
a finite time horizon T, we derive exponential deviation inequalities stating 
that the probability of obtaining, when replacing <P s: t\t by the correspond- 
ing FFBSm or FFBSi estimator, a Monte Carlo error exceeding a given e > 
is bounded by a quantity of order 0(exp(— cNs 2 )) where c is positive con- 
stant depending on T as well as the target function under consideration. 
The obtained inequalities, which are presented in Theorem 5 (FFBSm) and 
Corollary 6 (FFBSi), hold for any given number N of particles and are 
obtained by combining a novel backward error decomposition with an adap- 
tation of the Hoeffding inequality to statistics expressed as ratios of random 
variables. We then consider the asymptotic (as the number N of particles 
tends to infinity) regime and establish a central limit theorem (CLT) with 
rate and with an explicit expression of the asymptotic variance; see 



2 Since the first version of this paper has been released, an article [11] has been pub- 
lished. This work, developed completely independently from ours, complement the results 
presented in this manuscript. In particular, this paper presents a functional central limit 
theorems as well as nonasymptotic variance bounds. Additionally, this work shows how the 
forward filtering backward smoothing estimates of additive functionals can be computed 
using a forward only recursion. 
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Theorem 8. The proof of our CLT relies on a technique, developed gradually 
in [6, 15, 30], which is based on a CLT for triangular arrays of dependent 
random variables; however, since we are required to take the complex de- 
pendence structure of the smoothing weights into account, our proof is sig- 
nificantly more involved than in the standard filtering framework considered 
in the mentioned works. 

The second part of the paper is devoted to time uniform results, and we 
here study the behavior of the particle-based marginal smoothing distribu- 
tion approximations as the time horizon T tends to infinity. In this setting, 
we first establish, under the assumption that the Markov transition kernel M 
of the latent signal is strongly mixing (Assumption 4), time uniform devia- 
tion bounds of the type described above which hold for any particle popula- 
tion size N and where the constant c is independent of T; see Theorem 11. 
This result may seem surprising, and the nonobvious reason for its validity 
stems from the fact that the underlying Markov chain forgets, when evolv- 
ing conditionally on the observations, its initial conditions in the forward as 
well as the backward directions. Finally, we prove (see Theorem 12), under 
the same uniform mixing assumption, that the asymptotic variance of the 
CLT for the particle-based marginal smoothing distribution approximations 
remains bounded as T tends to infinity. The uniform mixing assumption 
in Assumption 4 points typically to applications where the state space of 
the latent signal is compact; nevertheless, in the light of recent results on 
filtering stability [14, 29] one may expect the geometrical contraction of the 
backward kernel to hold for a significantly larger class of nonuniformly mix- 
ing models (see [14] for examples from, e.g., financial economics). But even 
though the geometrical mixing rate is supposed to be constant in this more 
general case, applying the mentioned results will yield a bound of contraction 
containing a multiplicative constant depending highly on the initial distri- 
butions as well as the observation record under consideration. Since there 
are currently no available results describing this dependence, applying such 
bounds to the instrumental decomposition used in the proof of Theorem 5 
seems technically involved. Recently, [39] managed to derive qualitative time 
average convergence results for standard (bootstrap-type) particle filters un- 
der a mild tightness assumption being satisfied also in the noncompact case 
when the hidden chain is geometrically ergodic. Even though this technique 
does not (on the contrary to our approach) supply a rate of convergence, 
it could possibly be adopted to our framework in order to establish time 
average convergence of the particle-based marginal smoothing distribution 
approximations in a noncompact setting. 

The paper is organized as follows. In Section 2, the FFBSm algorithm and 
the FFBSi sampler are introduced. An exponential deviation inequality for 
the fixed interval joint smoothing distribution is derived in Section 3.1, and 
a CLT is established in Section 3.2. In Section 4, time uniform exponential 
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bounds on the error of the FFBSm marginal smoothing distribution estima- 
tor are computed under the mentioned mixing condition on the kernel M. 
Finally, under the same mixing condition, an explicit bound on the asymp- 
totic variance of the marginal smoothing distribution estimator is derived 
in Section 4.2. 

Notation and definitions. For any sequence {a n } n >o and any pair of in- 
tegers < m < n, we denote a m:n = (a m , ■ ■ ■ , a n ). We assume in the follow- 
ing that all random variables are defined on a common probability space 
(J7,.F,P). The sets X and Y are supposed to be Polish spaces and we de- 
note by S(X) and B(Y) the associated Borel a-algebras. .Fb(X) denotes the 
set of all bounded B(K)/B(M) -measurable functions from X to R. For any 
measure ( on (X, £>(X)) and any £-integrable function /, we set £(/) = f 
J x f(x)C{dx). Two measures £ and £' are said to be proportional (written 
£ oc £') if they differ only by a normalization constant. 

A kernel V from (X, B(X)) to (Y, B(Y)) is a mapping from X x B(Y) into 
[0, 1] such that, for each A G B(Y), x \-t V(x,A) is a nonnegative, bounded, 
and measurable function on X, and, for each i£X,ih-> V(x, A) is a measure 
on B(Y). For / G T h (X) and x G X, denote by V{x,f) d = / V(x, dx')f{x'); 
we will sometimes also use the abridged notation Vf(x) instead of V(x, /). 
For a measure v on (X,£>(X)), we denote by vV the measure on (Y,B(Y)) 

defined by, for any A G B(Y), i/V(A) d = / x V(x, A)i/(dx). 

Consider now a possibly nonlinear state space model, where the state pro- 
cess {Xt}t>o is a Markov chain on the state space (X,23(X)). Even though t 
is not necessarily a temporal index, we will often refer to this index as 
"time." We denote by x an d M the initial distribution and transition ker- 
nel, respectively, of this process. The state process is assumed to be hid- 
den but partially observed through the observations {Y t }t>o which are Y- 
valued random variables being conditionally independent given the latent 
state sequence {X t }t>o; in addition, there exists a a-finite measure A on 
(Y, B(Y)) and a nonnegative transition density function g on X x Y such that 
F[Y t G A\X t ] = J A g(X t ,y)X(dy) for all A G B(Y). The mapping x i— > g(x,y) 
is referred to as the likelihood function of the state given an observed value 
y G Y. The kernel M as well as the transition density g are supposed to 
be known. In the setting of this paper, we assume that we have access 

def 

to a record of arbitrary but fixed observations yo:T = [vo, • • • iVt], and our 
main task is to estimate the posterior distribution of (different subsets of) 
the state vector Xq-t given these observations. For any t > 0, we denote 

by gt(x) = f g(x,yt) (where the dependence on y t is implicit) the likelihood 
function of the state Xt given the observation yt. 

For simplicity, we consider a fully dominated state space model for which 
there exists a cr-finite measure v on (X,/3(X)) such that, for all x G X, M(x, ■) 
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has a transition probability density m(x, •) with respect to v. For notational 
simplicity, v{dx) will sometimes be replaced by dx. 

For any initial distribution x on (X,23(X)) and any < s < s' < T, de- 
note by <P s - s i\t the posterior distribution of the state vector X s:s / given the 
observations yo-.T- For lucidity, the dependence of (fi s:s i\T on the initial distri- 
bution x is omitted. Assuming that f ■ ■■ J x{dxo) JIu=i 9u-i{x u -i)M(x u _i, 
dx u )gT(xT) > 0, this distribution may be expressed as, for all h G J r b(X s '~ s+1 ), 

/• • •/ x(^o)Ilu=i3u-i(a;«-i)M(a;„_i,dx,;)gT(a;T) 

In the expression above, the dependence on the observation sequence is im- 
plicit. If s = s 1 , we use 4> s \t (the marginal smoothing distribution at time s) 

as shorthand for <p s:s \T- If s = s' = T, we denote by cj) s = (fr s \ s the filtering 
distribution at time s. 



2. Algorithms. Conditionally on the observations yo-.T ■, the state sequence 
{X s } s >q is a time inhomogeneous Markov chain. This property remains true 
in the time-reversed direction. Denote by the so-called backward kernel 
given by, for any probability measure rj on (X, B(X)), 

m t? t u\ dcf Jv{dx')m(x\x)h(x') 

(1) *vM= h{dx , )m{x ,^ x) i he KM. 

The posterior distribution 4> s -t\t ma Y be expressed as, for any integers T > 0, 
sG {0,...,T- 1} and any h G T h (X T ~ s+1 ), 

(2) 4> s -.T\T{h) = I ■ ■ J ^t(^t)B0 t _ 1 (x T , dxr-i) ■ ■ ■ B<£ s (x s+ i, dx s )h(x s . T ). 

Therefore, the joint smoothing distribution may be computed recursively, 
backward in time, according to 

(3) (f>s:T\Ah) = J ■ ■ ■ J B (j>3 (x s+ i,dx s )(f) s+ i. T \ T (dx s+ i : T)Hx s ..T)- 

2.1. The forward filtering backward smoothing algorithm. As mentioned 
in the Introduction, the method proposed by [18, 24] for approximating the 
smoothing distribution is a two pass procedure. In the forward pass, particle 
approximations <p^ of the filter distributions <p s are computed recursively 
for all time steps from s = up to s = T. The filter distribution flow {4> s } s >o 
satisfies the forward recursion 

(4) < f >a (h) = ^- where 7 o (h) = X (goh), ls (h)= 7s-lM(g s h),s>l, 

7s(l) 
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for h € .Fb(X), with 1 being the unity function x H- 1 on X. In terms of SMC, 
each filter distribution <f) s is approximated by means of a set of particles 
{£s}i=i and associated importance weights {u;*}^ according to 

N ( u\ N 

(5) tf(h) * ^ where y?(h) * iV" 1 ]>>^)- 

Having produced, using methods described in Section 2.4 below, a se- 
quence of such weighted samples {(C?) w l)}i=i; 1 < i < T, an approximation 
of the smoothing distribution is constructed in a backward pass by replacing, 
in (2), the filtering distribution by its particle approximation. This yields 

(6) ^ T \ T {h) d = /• • J ^(dx^B^N ^ {x T , dxT-i) ■ ■ ■ B^n (x s+1 ,dx s )h(x s . T ) 

for any h £ J 7 \ ) (K T ~ S+1 ). The approximation above can be computed recur- 
sively in the backward direction according to 

(7) <Pf:T\T( h ) = J' " J B<f>N( x s+l,dx s )<f)^ +1 . T \ T (dx s+ l : T)h(x s: T). 

Now, by definition, 

B# (x, h) = f^ J? M ?',rt M O, h G 

i=l Z^£=l u s m \^s, x ) 
and inserting this expression into (6) gives 

N N { T I,?""- 1 ml?""- 1 £M \ , , iT 

(8) «&tt(*)=e-e n < T.v L m^'---'^ 



of 4>s-.T\T{h)i where /iG ^(X s ) and 



The estimator 4>^. T \ T is impractical since the cardinality of its support grows 
exponentially with the number T — s of time steps; nevertheless, it plays 
a key role in the theoretical developments that follow. A more practical ap- 
proximation of this quantity will be defined in the next section. When the 
dimension of the input space is moderate, the computational cost of evalu- 
ating the estimator can be reduced to 0(N log N) by using the fast multi- 
pole method as suggested in [28]; note, however, that this method involves 
approximations that introduce some bias. On the other hand, in certain 
specific scenarios, such as discrete Markov chains with sparse transition ma- 
trices over large state spaces, the complexity can even be reduced to O(NT) 
without any truncation; see [1]. 
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2.2. The forward filtering backward simulation algorithm. The estima- 
tor (8) may be understood alternatively by noting that the normalized 
smoothing weights define a probability distribution on the set {1, . . . , N} T ~ S 
of trajectories associated with an inhomogeneous Markov chain. Indeed, con- 
sider, for t E {0, . . . ,T — 1}, the Markov transition matrix {A^(i, j)}f^- =1 
given by 



For 1 < t < T, denote by 

(11) Jf d ^ f a{Y 0:T , (£, a#; < a < t, 1 < i < 

the cr-algebra generated by the observations from time to time T as well 
as the particles and importance weights produced in the forward pass up to 
time t. The transition probabilities defined in (10) induce an inhomogeneous 
Markov chain { J M }^ =0 evolving backward in time as follows. At time T, the 
random index Jj- is drawn from the set {1, . . . ,N} such that Jt takes the 
value i with a probability proportional to u T . At time t < T — 1 and given 
that the index Jt+i was drawn at time step t + 1, the index Jt is drawn from 
the set {1, . . . , iV} such that Jt takes the value j with probability A^( J t , j). 
The joint distribution of Jq-t is therefore given by, for jq-t E {1, . . . , N} T+1 , 

Jt 

(12) F[J 0lT =j Q ..T\^ ] = ff- A^(Jr,Jr-i) • ■ • AffOWo). 

Thus, and this is a key observation, the FFBS estimator (8) of the joint 
smoothing distribution may be written as the conditional expectation 

(13) ^ T{T (h) = E[h($,. . • ,Ct T )I^t ], h E Jb(X T+1 ). 

We may therefore construct an unbiased estimator of the FFBS estimator 
by drawing, conditionally independently given Fg, N paths of {Jo :T }f =1 of 
the inhomogeneous Markov chain introduced above and then forming the 
(practical) estimator 

N 

(14) 4>o :TlT (h) = N- 1 ]T h{i J J\. . . , £#), h E J- b (X T+1 ). 

This practical estimator was introduced in [21] (Algorithm 1, page 158). 
For ease of notation, we have here simulated iV replicates of the backward, 
index-valued Markov chain, but it would of course also be possible to sample 
a number of paths that is either larger or smaller than N . The estimator 
4>q-t\t ma y be seen as a Rao-Blackwellized version of <Pq. t ^ t - The variance 



SMC SMOOTHING FOR HMM 



9 



of the latter is increased, but the gain in computational complexity is sig- 
nificant. The associated algorithm is referred in the sequel to as the forward 
filtering backward simulation (FFBSi) algorithm. In Section 4, forgetting 
properties of the inhomogeneous backward chain will play a key role when 
establishing time uniform stability properties of the proposed smoothing 
algorithm. 

The computational complexity for sampling a single path of Jq-t is O(NT); 
therefore, the overall computational effort spent when estimating 4>q. t \ t us- 
ing the FFBSi sampler is 0(N 2 T). Following [28], this complexity can be 
reduced further to 0(iVlog(iV)T) by means of the fast multipole method; 
however, here again computational work is gained at the cost of introducing 
additional approximations. 

2.3. A fast version of the forward filtering backward simulation algorithm. 
We are now ready to describe one of the main contributions of this pa- 
per, namely a novel version of the FFBSi algorithm that can be proved to 
reach linear computational complexity under appropriate assumptions. At 
the end of the filtering phase of the FFBSi algorithm, all weighted parti- 
cle samples {(d,^)}^, < s < T, are available, and it remains to sam- 
ple efficiently index paths {J^. T }^ =1 under the distribution (12). When the 
transition kernel m is bounded from above in the sense that m(x, x') < o~ + 
for all (x,x') € X x X, the paths can be simulated recursively backward in 
time using the following accept-reject procedure. As in the standard FF- 
BSi algorithm, the recursion is initiated by sampling J^, . . . , multinomi- 
ally with probabilities proportional to {oo^iLi- For s G {0,...,T}, let 
the smallest cr-field containing and <t(J/ : 1 < I < N, t > s); then in or- 
der to draw Jf conditionally on G^L\, we draw, first, an index proposal if 
taking the value i £ {1, . . . , N} with a probability proportional to col an d, 
second, an independent uniform random variable f/f on [0,11. Then we set 

J l s = Ig if Ug < m(£ s s , ^g^ 1 )/o~+ ; otherwise, we reject the proposed index and 
make another trial. To create samples of size n E {1, . . . , N} from a multi- 
nomial distribution on a set of N elements at lines 1 and 6, Algorithm 1 
relies on an efficient procedure described in Appendix B.l that requires 
0(n(l + log(l + N/n))) elementary operations; see Proposition 14. Using 
this technique, the computational complexity of Algorithm 1 can be upper- 
bounded as follows. 

For the bootstrap particle filter as well as the fully adapted auxiliary 
particle filter (see Section 2.4 for precise descriptions of these SMC filters), it 
is possible to derive an asymptotic expression for the number of simulations 
required at line 8 of Algorithm 1 even if the kernel m is not bounded from 
below. The following result is obtained using theory derived in the coming 
section. 
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Algorithm 1 FFBSi-smoothing 



1: sample Jj,,...,J^ multinomially with probabilities proportional to 

for s from T — 1 down to do 
L<-(l,...,iV) 

while L is not empty do 
n <— size(L) 

sample I\ , . . . , l n multinomially with probabilities proportional 

to HjfL, 

sample U±, . . . , U n independently and uniformly over [0, 1] 
nL<-0 

for k from 1 to n do 

if U k <m(d {k) ,^ )M- then 

else 

nL^nLU{L(k)} 
end if 
end for 
L <— nL 
end while 
end for 



Proposition 1. Assume that the transition kernel is bounded from above, 
m(x,x') < <r+ for all (x,x') £XxS. At each iteration s G {0, . . . ,T — 1}, 
lei 6e i/ie number of simulations required in the accept-reject procedure 
of Algorithm 1. 

• For the bootstrap auxiliary filter, /N converges in probability to 

, sdef , ( s I---Jdx s+1 l\l=s + 2l m ( x u-i,dx u )g u (x u ) 

a{s) = <7+0 a | a _iO/ a )- — T 

J •••J s | s -i(oa;s)3s(a; s )ll u=s+ im(cc u _i,dx u )5 u (x u ) 

as iV goes to infinity. 

• in £/ie /uZZt/ adapted case, /N converges in probability to 

a , xdef J---Jdx s+1 Y\l =s+2 Jm(x u - 1 ,dx u )g u (x u ) 

p{s) = cr + — j, 

J ■ • • J (j)s{dx s )g s (x s ) \{ u=s+l m(x u - 1 ,dx u )g u (x u ) 

as N goes to infinity. 

A sufficient condition for ensuring finiteness of a(s) and /3(s) is that 
f 9u(x u ) dx u < oo for all u>0. 
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If the transition kernel satisfies stronger mixing conditions, it is possible 
to derive an upper-bound on the computational complexity of the FFBSi 
for any auxiliary particle filter, that is, the total number of computations 
(and not only the total number of simulations). Note that this result is not 
limited to the bootstrap and the fully adapted cases. 

Proposition 2. Assume that the transition kernel is bounded from be- 
low and above, that is, cr_ < m(x, x') < a + for all (x, x') £lx X. Let C(N, T) 
denote the number of elementary operations required in Algorithm 1. Then, 
there exists a constant K such that such that ~E[C(N, T)] < KNTa+jcr^. 

The proofs of Propositions 1 and 2 involve theory developed in the coming 
section and are postponed to Section 5. 

Before concluding this section on reduced complexity, let us mention that 
efficient smoothing strategies have been considered by [19] using quasi-Monte 
Carlo methods. The smoother (restricted to be one-dimensional) presented 
in this work has a complexity that grows quadraticly in the number of par- 
ticles N; nevertheless, since the variance of the same decays as 0(N~ 2 ) (or 
faster) thanks to the use of quasi-random numbers, the method is equiva- 
lent to methods with complexity growing linearly in N [since the standard 
Monte Carlo variance is OiN- 1 )}. This solution is of course attractive; we 
are however not aware of extensions of this approach to multiple dimensions. 

2.4. Auxiliary particle filters. It remains to describe in detail how to pro- 
duce sequentially the weighted samples {(Cs) w s)}i=i) < s < T, which can 
be done in several different ways (see [3, 17, 31] and the references therein). 
Still, most algorithms may be formulated within the unifying framework of 
the auxiliary particle filter described in the following. Let be i.i.d. 

The 



random variables such that £g ~ po and set ojq = d;Wdpo(£o)5o(£o) 
weighted sample {(£oi w o)}i=i then targets the initial niter in the 
that c/)q (h) estimates 4>o{h) for h £ ^(X). In order to describe the sequential 
structure of the auxiliary particle filter, we proceed inductively and assume 
that we have at hand a weighted sample w*_ 1 )}^ 1 targeting <fi s -i i n 

the same sense. Next, we aim at simulating new particles from the target 
d)^' 1 defined as 



le sense 



(15) W= V j; / , h€T b (X), 



in order to produce an updated particle sample approximating the subse- 
quent filter (p s . Following [32], this may be done by considering the auxiliary 
target distribution 



(16) tfT&h)^ „ N -\ \ s l7*r \„ heT h (X) 
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on the product space {1, . . . , N} x X equipped with the product cr-algebra 
V({1,...,N}) <g) B(X). By construction, is the marginal distribution 
of (fi s ' a with respect to the particle index. Therefore, we may approximate 
the target distribution (j)^^ on (X, B(K)) by simulating from the auxiliary 
distribution and then discarding the indices. More specifically, we first sim- 
ulate pairs {{Il-,C s )}iLi °f indices and particles from the instrumental dis- 
tribution 

(17) ns\ a (i,h)acui_M?.-i)Ps(£-i,h), heF h (X), 

on the product space {1, . . . , N} x X, where {$s(£s-i)}£Li are so-called ad- 
justment multiplier weights and P s is a Markovian proposal transition kernel. 
In the sequel, we assume for simplicity that P s (x, •) has, for any x E X, a den- 
sity p s (x, •) with respect to the reference measure v. For each draw 
i = 1, . . . , N, we compute the importance weight 

such that <jj\ oc d<fi^ ,a /dir s \ s (P s , and associate it to the corresponding 
particle position Finally, the indices are discarded whereupon 

{(Cs^l^iLi is taken as an approximation of <f> s . The simplest choice, yield- 
ing to the so-called bootstrap particle filter algorithm proposed by [22], 
consists of setting, for all x € X, r & s {x) = 1 and p s (x, ■) = m(x,-). A more 
appealing — but often computationally costly — choice consists of using the 
adjustment weights $ s (x) = $*(x) ^ f m(x, x')g s {x') dx' , x G X, and the pro- 
posal transition density 

*, def m{x,x')g s {x') , 
p(x,x) = — — , i,i eXxI. 

In this case, the auxiliary particle filter is referred to as fully adapted. Other 
choices are discussed in [16] and [7]. 

3. Convergence of the FFBS and FFB Si algorithms. In this section, the 
convergence of the FFBS and FFBSi algorithms are studied. For these two 
algorithms, nonasymptotic Hoeffding-type deviation inequalities and CLTs 
are obtained. We also introduce a decomposition, serving as a basis for most 
results obtained in this paper, of the error <Pq-t\t ~ ^0:T|r an d some technical 
conditions under which the results are derived. 

For any function f :X. d — > M., we define by |/|oo = sup x&X d\f(x)\ and 

osc(/) = sup^ tX i-) e x d xX d \ f( x ) ~ f( x ')\ the supremum and oscillator norms, 

respectively. Denote N = NU {oo} and consider the following assumptions 
where T is the time horizon which can be either a finite integer or infinity. 
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Assumption 1. For all < t < T, g t (-) > and sup 0<i<r \gt\oo < oo. 

Define for t > the importance weight functions 

no , , ,dcf d x , w . ( ^def m{x,x')g t {x') 

(19) ^o(x) = ^(x) 50 (x) and ^,x) = ^^, *>1- 

Assumption 2. sup 1<t<T |??t|oo < oo and sup 0<t<T |wt|oo < oo- 

The latter assumption is rather mild; it holds in particular under As- 
sumption 1 for the bootstrap filter (pt = m and = 1) and is automatically 
fulfilled in the fully adapted case (wj = 1). 

The coming proofs are based on a decomposition of the joint smoothing 
distribution that we introduce below. For < t < T and h 6 J-b(X r+1 ), define 
the kernel L t , T ■ x B(X)® T+1 -> [0, 1] by 



(20) L t>T (x 0:t ,h) = 



/'"/( IT M ( x u-i,dx u )g u (x u ))h(x 0:T ) 

\u=t+l / 



and set LT,T{xo : T,h) d = /i(xo:t)- By construction, for every t € {0, ... ,T}, 
the joint smoothing distribution may be expressed as 

(21) <^0:r|T(/l) = FT T~rvr- 

This expression extends the classical forward-backward decomposition to 
the joint smoothing distribution; here L^r(-,/i) plays the role of the so- 
called backward variable. This suggests to decompose the error <p^ T ^ T (h) — 
4>0:T\T(h) as the following telescoping sum: 

a,n /.n , /? \ ^p r [^o,T(-,fe)] <fo[L ,r(-,h)] 
1 ^ v [L ,t(-,1)] 0o[L Oi t(-,1)J 

( 22 ) 



+E 



t=l ^ Y 0:t\ 

The first term on RHS of the decomposition above can be easily dealt with 
since (pQ is a weighted empirical distribution associated to i.i.d. random 
variables. 

To cope with the terms in the sum of the RHS in (22), we introduce some 
kernels (depending on the past particles) that stress the dependence with 

t\ 



respect to the current particules. More precisely, i^hJL^tO, h)\ is expressed 



as 

(23) 4>o;t\t[ L t,T{-, h)] = 4>f[Cf jT {-, h)] = ^ [C l^ k)] 



-yf(i) 



14 DOUC, GARIVIER, MOULINES AND OLSSON 

where the random kernels C^ T : X x B(H)®( T+1 ^ — > [0, 1] are defined by: for 
all < t < T, and x t £ X, 

(24) C^ T (x t , h) d = J ' ■■■ J B^iv i (x t , • • • (x x , dxo)L i>T (xo :i , h), 
and 

(25) ££ t (x,/0 = £o,tM- 

We stress that the kernels Cf T depend on the particles and weights (£* , ui\)f =l , 
< s < t — 1, through the particle approximations <^^i, • • ■ , ^ °f the fil- 
ter distributions. When proving the CLT for the FFBS algorithm, it will 
be crucial to establish that for any h £ J r \ :i (K T+1 ), Cf T (-,h) converges (see 
Lemma 7 below), as the number N of particles tends to infinity, to a deter- 
ministic function Ct^T(',h) given by 

(26) C t< T(x t ,h) d = / ••• / B$ t _ 1 (x u dxt-i)---Bfo(xi,dx G )L ttT (xQ.tih). 



In the sequel, the case h = l will be of particular importance; in that case, 
Lt,T(%o-.t, 1) does not depend on xo-.t-i, yielding 

(27) C^ T {x u 1) = C t , T (x t , 1) = L ttT (x Q:t , 1) 

for all xo : t £ X* . Using these functions, the difference appearing in the sum 
in (22) may then be rewritten as 

<j>"[L t>T (-,h)} < t _ 1 , t _ 1 [L t -i,T(->>0] 



C^tG,!)] ^ t _ 1|t _ 1 [L t _ 1)T (.,l)] 



where the kernel G^ T : X x B(X) T+1 -> [0, 1] is defined by, for x £ X, 

(29) G- t( x, *) 1* £f T (x, h) - ^Mk^^ I)- 

Similarly to Cf T (-,h), the functions G^ T (-,h) depend on the past particles; 
it will however be shown (see Lemma 7 below) that G^ T (-,h) converges to 
the deterministic function given by, for x £ X, 

(30) Gt,r(x,h) = Ct,T{x,h- (f> :T\Ah)). 

The key property of this decomposition is stated in the following lemma. 
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Lemma 3. Assume that Assumptions 1-2 hold for some T < oo. Then, 
for any 0<t<T, the variables {ujfG^ T (S,f,h)}^ =1 are, conditionally on the 

o~ -field J-j^_ 1 , i.i.d. with zero mean. Moreover, there exists a constant C (that 
may depend on t and T) such that, for all N > 1, £ £ {1, . . . ,N}, and h £ 
-Fb(X T+1 ), 

Proof. By construction, all pairs of particles and weights of the weighted 
sample > ^t^eLi are i-i-d. conditionally on the cr-field J r ^_ 1 - This implies 
immediately that the variables {wf T (£|, h)}^_ 1 are also i.i.d. conditionally 
on the same cr-field J 7 ^- We now show that W\oj\G^ t {(1 , h)\ J 7 ^] = 0. Using 
the definition of G^ T and the fact that (p^^C^-^ T (-, h)] and 4>^_ 1 [C^_ l T (-, 1)] 
are .F/^-measurable, we have 

ElutGMZhh)^] 



f [£?_ ltT (;h)] . 

't-il/^i/rO) -01 



which is equal to zero provided that the relation 

(3D n^Athh)^} = ^- l[ jr)*\' h)] 

holds for any h S J-"b(X). We now turn to the proof of (31). Note that for 
any / G T h (X), 

EK/(^)|A_iJ = — jj — - — - 

E^=i w t-i^(C!-i) 

(32) 

fc[M(- >gt /)] 

It turns out that (31) is a consequence of (32) with /(•) = Cf T (-,h), but 

since Cf_ lT (-,h) is in general different from M(-,g t Cf T (-,h)), we have to 
prove directly that 

(33) ^[AVf-^l^f-iM-,^^))]. 
Write 

^i[M(-,^ T (.,fc))] 

(34) ={1^^24-1 I •■ J mi^x^gtix^f^B^N^x^dxu^)) 

£=1 \u=l / 

x L t)T {xQ. t ,h) dx t . 
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To simplify the expression in the RHS, we will use the two following equal- 
ities: 



N 



N 



(35) \^u3l_ 1 m{il_ 1 ,xt)\B^N^ u dxt-x)=^ 



(36) 



M(xt-i,dx t )gt(xt)L tjT (xo:t,h) = L t _ ljT (x :t-i,ti). 



The first relation is derived directly from the definition (1) of the backward 
kernel, the second is a recursive expression of Lfr which is straightforward 
from the definition (20). Now, (35) and (36) allow for writing 



dxt 



u=l 

M(xt-udx t )gt{xt)5^^(dxt-i) 
t-i 

x \\ B (j) N_ i (x u ,dx u - 1 )Lt ! T{xo:t,h) 

u=l 

N ,. . t-1 

y^^|-i / ••• / 5^_ i (<k>t-i)Y[ B <^_ 1 i x u,<^u-i)Lt-i,T(x Q .t-i,h) 



u=l 



N 



= E^f-l(d-l^)- 

By plugging this expression into (34), we obtain (33) from which (31) follows 
via (32). Finally, E [u>\ G^ T (^ ,h)\ J 7 ^ 1 ] =0. It remains to check that the 
random variable ujG^ T (^ : h) is bounded. But this is immediate since 



^G^l,h)\ = \u t \ 



(37) 
□ 



^ [£"(-,!)] 



A,r(-,1) 



< 2|w t | 00 |£f T (-,l)| 00 osc(/i) < 2| |^t,T(-,l)|ooOSc(/l). 



3.1. Exponential deviation inequality. We first establish a nonasymptotic 
deviation inequality. Considering (28), we are led to prove a Hoeffding in- 
equality for ratios. For this purpose, we use the following elementary lemma 
which will play a key role in the sequel. The proof is postponed to Ap- 
pendix A. 
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Lemma 4. Assume that a;\i, bjy and b are random variables defined on 
the same probability space such that there exist positive constants f3, B, C 
and M satisfying: 

(I) \a N /b N \ < M, F-a.s. and b>f3, F-a.s., 
(II) for alle>0 and all N>1, F[\b N - b\ > e] < Be~ CNe2 , 
(III) for alle>0 and all N>1, F\\a N \ > e] < Be- CNi ~ £ l M ^ . 



Then 



a N 



>N 



> e I < B exp 



2M J 



Theorem 5. Assume that Assumptions 1-2 hold for some T < oo. Then, 
there exist constants < B and C < oo (depending onT) such that for all N , 
e > 0, and all measurable functions h G J r \ y (K T+1 ) , 



(38) 

In addition, 
(39) 



&rtr(h) - ^.T\T(h)\ >e]< Be- CN * 2 l^ h \ 



N 



> N^oo 



Remark 1. As a by-product, Theorem 5 provides an exponential in- 
equality for the particle approximation of the filter. For any h G J-j^X), 
define the function ho : x '■ — > R by Hq-t{xq-.t) — h{xx). By construction, 
0O:T|t(^O:t) = fr(/») and <j>Q. T i T (ho:T) =( I>t^ 1 )- With this notation, equa- 
tion (38) may be rewritten as 

W[\<$(h) - <j>r(h)\ >e}< Be~ CNe2 /° sc2 ( h \ 

An inequality of this form was first obtained by [12] (see also [9], 
Chapter 7). 



Proof. We prove (38) by induction on T using the decomposition (22). 
Assume that (38) holds at time T — 1, for <p^. T _ 1 ^ T _ 1 (h). Let h G T\ ) (X T+1 ) 
and assume without loss of generality that 4'0:T\t(^ 1 ) = 0- Then (21) implies 
that 0o[£o,t(') h)] = and the first term of the decomposition (22) thus 
becomes 
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where are i-i-d. random variables with distribution po- We obtain an 

exponential inequality for (40) by applying Lemma 4 with 



N 



a N = N- 1 ^^(&)g (&)L , T (ti,h), 



lb = p= x \go(-)Lo,T(-> 1 )}- 

Condition (I) is trivially satisfied and conditions (II) and (III) follow from 
the Hoeffding inequality for i.i.d. variables. 

By (22) and (28), it is now enough to establish an exponential inequality 
for 



(41) 



t [L ttT (; 1)] C-i| t -i [Lt-tH; 1)] N-t Eti uU*r& 1) ' 
where < t < T. For that purpose, we use again Lemma 4 with 

N 



(42) 



=i 



N 



b N = N- 1 Y," e t£t,T$,l), 



b = fi 



i t _ 1 [£ t _ liT (.,l)] 



&-i(0t) 

By considering the LHS of (41), /^tv| < 2|/i|oo, verifying condition (I) 
in Lemma 4. By Lemma 3, Hoeffding's inequality implies that there exist 
constants B and C such that for all N, e > 0, and all measurable function 

N 



>£ 



E 



N 



>e 



-rN 
->~ t-1 



< Be -CNe 2 /osc 2 (h)^ 



verifying condition (III) in Lemma 4. It remains to verify condition (II). 
Since the pairs of particles and weights of the weighted sample {(^ , oj()}f =1 
are i.i.d. conditionally on J~£Li, Hoeffding's inequality implies that 



(43) 



b N -E 



N 



N 



1=1 



>e 



< Be 



-CNe 2 
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Moreover, by (32), (27), and the definition (20), we have 
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E 



(44) 



N 



e i [A-l,T(-,l)] 



!»t- 1 [A- 1 ,T(-,l)] = d(g) 



def 



with H(-) u = 



-i[A-i,T(-,l)]iM-)M-i(0t)- To obtain an ex- 



ponential deviation inequality for (44), we apply again Lemma 4 with 



TV 



i(H), 



b' =P> =^_x (0 t )- 



By using the inequality 
£ t -i,T(xt-i, 1) 



m(x t „i,xt)fff(a; f ) 

< $t(xt-l)\ut\oo\Ct,T(-, l)|oo, 



Pt{x t -i,x t )£.t,T(xt, l)dx t 



we obtain the bound l<^^Li 



i(#t)| < 2|wt| 00 |£ tj T(-,l)|oo which veri- 
fies condition (I). Now, since t — 1 < T — 1 and 4>t-i(H) = 0, the induction 
assumption implies that conditions (II) and (III) are satisfied for \b' N — b'\ 
and \a' N \. Hence, Lemma 4 shows that 









N 










(45) P 




E 




•r t-x 


-b 


> £ 


< Be' CN " 2 



















Finally, (43) and (45) ensure that condition (II) in Lemma 4 is satisfied and 
an exponential deviation inequality for (41) follows. The proof of (38) is 
complete. The last statement (39) of the theorem is a consequence of (43) 
and (45). □ 

The exponential inequality of Theorem 5 may be more or less immediately 
extended to the FFBSi estimator. 

Corollary 6. Under the assumptions of Theorem 5 there exist con- 
stants < B and C < oo (depending on T) such that for all N , £ > 0, and 
all measurable functions h, 



(46) 



"(h) - ^ T]T (h)\ >£]< Be- CNs2 l°^ h \ 



where (t>Q. T \ T (h) is defined in (14)- 
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PROOF. Using (13) and the definition of 4>f. T ^ T (h), we may write 

4>0:T\T( h ) - ^O-.TlAh) 
N 

= N-^mt ,. . . ,#) - nh(t J o°, ■ ■ ■ ,£t t )i^t ]], 

1=1 

which implies (46) by the Hoeffding inequality and (38). □ 

3.2. Asymptotic normality. We now extend the theoretical analysis of 
the forward-filtering backward-smoothing estimator (6) to a CLT. Consider 
the following mild assumption on the proposal distribution. 

Assumption 3. |m|oo < oo and sup 0<t <r \pt\oo < °°- 

CLTs for interacting particle models have been established in [9, 12, 15]; 
the application to these results to auxiliary particle filters is presented in [25] 
and [16], Theorem 3.2. Here, we base our proof on techniques developed 
in [15] (extending [6] and [30]). As noted in the previous section, it turns 
out crucial that Gf T (- : h) converges to a deterministic function as N — >co. 
This convergence is stated in the following lemma. 

Lemma 7. Assume Assumptions 1-3. Then, for any h £ J-~b(X) and 

xGX, 

lim C tT (x,h) = jCtT(x,h), P-a.s., 

TV— >oo ' 

lim G? T (x,h) = G tT (x,h), P-a.s., 

7V->oo ' 

where C^ T , C t>T , Gf T and G t>T are defined in (24), (26), (29) and (30). 
Moreover, there exists a constant C ( that may depend on t and T ) such that 
for allN>l, £e{l,...,N}, andhe^X), 

\c4G tt T(&h)\ < \u t \oo\Gt,T($M < Cosc(h), P-a.s. 

Proof of Lemma 7. Let h e .Fb(X) and x t G X. By plugging (1) with 
■q = <p^_i into the definition (24) of Cf T (xt,h), we obtain immediately 

Ct >T (x t ,h) 

_ /•••/ (pf_ 1 (dx t ^i)]J t ^J B^N(x u+ i,dx u )rn(x t ^ 1 ,xt)Lt ! T(x 0: t,h) 



J (j)f_ x (dx t -i)rn{xt-i,x t ) 



dcf 

N - m Wlth H \ x o--t) = m{xt-i,x t )L t:T (xo:t,h). 



CiK. 1 *)] 
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The convergence of C^ T (-,h) follows from Theorem 5. The proof of the 
convergence of G? T (-,h) follows the same lines. Finally, the final statement 
of the lemma is derived from Lemma 3 and the almost sure convergence 
of G% T (-,h) to G t>T (-,h). □ 

Now, we may state the CLT with an asymptotic variance given by a finite 
sum of terms involving the limiting kernel Gt,T- 

Theorem 8. Assume Assumptions 1-3. Then, for any h G J 7 ^(K T+1 ), 

(47) VN(<p^ T]T (h) - 4> mT (h)) AjV(o,r ^| T W) 

with 

(Aa , r rii1 drf PoH(-)Gl T (;h)} ^ ^-iKrC-./t)]^-!^) 

(48 J l 0:T | T ft — — — — + > ~2 — — — — , 



(49) ^ r (.,/i) = ^(-) y P t (.,da;K(-,x)G^ r (a;,/i). 

Proof. Without loss of generality, we assume that 0o:T|t(^) = 0- We 
show that VN (j>Q. T i T (h) may be expressed as 

T y N (h) 

(50) v / iv^ T|T (M = E4fr Z ' 

t=o '^t.r 

where the sequence of random vectors [V^(/i), . . . , V^ T (/i)] is asymptoti- 
cally normal and [ Wq T , . . . , Wj? T ] converge in probability to a deterministic 
vector. The proof of (47) then follows from Slutsky's lemma. Actually, the de- 
composition (50) follows immediately from the backward decomposition (22) 
by setting, for t & {1, . . . , T}, 

N 



=i 



N , 

i=i ap0 

N 

i=i 
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The convergence 

W$r A^ooxboO^rC-.l)], 

w/W 0f-i[£f-i, T (-,l)] 
<Pt-i\yt) 

of [Wq^, . . . , T ] to a deterministic vector is established immediately us- 
ing (39) and noting that the initial particles (Q))iLi are i-i.d. We devote the 
rest of the proof to showing that the sequence of random vectors [V c \ip(h), . . . , 
Vjf T {h)\ is asymptotically normal. Proceeding recursively in time, we prove 
by induction over t e {0, . . . , T} (starting with t = 0) that [V£ T (h), V t N T {h)} 
is asymptotically normal. More precisely, using the Cramer-Wold device, it 
is enough to show that for all scalars (ao, • • • , at) £ 



(51) Y,a r V r N T (h) A^oo AA 0,^ 2 y r , T [h] , 

r=0 V r=0 / 

where, for r > 1, 

2 rM dcf r 2^2 / >M 2 r.n def ^t-lKT(-,^)] 



^tN = Po[^o G o,T(-,h)], a tiT [ 



The case t = is elementary since the initial particles {£q}£Li are i.i.d. 
Assume now that (51) holds for some t — 1 < T; for all scalars (a±,..., 
a t _ 1 )GR*- 1 , 

t-1 / t-i \ 

(52) £ a r V r N T (h) AA 0, ^ a^ r 2 T [/i] . 

r=s \ r=s / 

The sequence of random variable V^ T {h) may be expressed as an additive 
function of a triangular array of random variables, 

N 

V t N T (h) = £ U N/ , U N/ d ^ f <4G»?(& h)/VN, 
l=\ 

where G^ T (x,h) is defined in (29). Lemma 3 implies that E[V^(/i)| J~tL]\ = 
0, yielding 



E 



J2a r V r N T (h) j£ x 



T=0 



t-1 / t-x \ 

Y J »rV r N T {h) A^oo 0,^a r V r 2 T [/i] , 

r=0 V r=l / 



where the last limit follows by the induction assumption hypothesis (52). 
By [15], Theorem A. 3, page 2360, as the random variables {Un^}^ =1 are 
centered and conditionally independent given (51) holds provided that 
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the asymptotic smallness condition 

N 

(53) E^V^I^}!-^ -^-+00 
holds for any e > and that the conditional variance converges: 

N 

(54) Y.^h^t-l] -^N^oo al T [h\. 
i=i 

Lemma 3 implies that |?7jv^| — C osc(h) / y/N , verifying immediately the 
asymptotic smallness condition (53). To conclude the proof, we thus only 
need to establish the convergence (54) of the asymptotic variance. Via Lem- 
ma 3 and straightforward computations, we conclude that 

N 

j2nui e \^ N i}=n(^GMe t ,h)) 2 \^ 1 ] 
i=i 

J e=i Ej=i^t-iMC J t -i) 

(55) 

= ( N ^TTJTj J (^X>*-i<r(g*-i.ft)) 

\Y,j=i<4-M&-\) J v h - x i=x ) 

where fit is defined in (9) and 

< T M) = M-) J Pt{;dx)u? t {;x)[G% T {xM 2 - 

The denominator in on RHS of (55) converges evidently in probability 
to 4>t-i('&t) by Theorem 5. The numerator is more complex since vf T de- 
pends on Gf T whose definition involves all the approximations (p^Li, ■ • ■ , 4>q 
of the past filters. To obtain its convergence, note that, by Theorem 5, 
<j>f-\{vt,T{-^)) — > <fit-i(vt,T( m ,h)) as N tends to infinity; hence, it only re- 
mains to prove that 

(56) <l>"i[vgr(-,h)-vt,T(',h)] A^oc 0. 

For that purpose, introduce the following notation: for all i£X, 

^(x) = #J^(Op t (^K 2 (^x)|(Gf T (x,/ i )) 2 -G t 2 T (x,/ l )|], 
B N (x) = <f>^x[M-)PtM]. 
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Applying Fubini's theorem, 



(57) 



lim E 

jV->-oo 



An(x) dx 



lim / E[A N (x)]dx = 0, 

iV-s-oo J 



where the last equality is due to the generalized Lebesgue convergence the- 
orem [34], Proposition 18, page 270, with Jn(x) = E[An(x)] and <?Ar(x) = 
2C osc(1i)E[Bn(x)] provided that the following conditions hold: 

(i) for any x G X, E[A N (x)] < 2C 2 osc 2 (h)E[B N (x)], 

(ii) for any i£X, limjv->oo EL4at(x)] = 0, P-a.s., 
(hi) liniAT^oo jE[B N (x)]dx = J lim N ^ OD E[B N (x)]dx. 

Proof of (i). The bound follows directly from Lemmas 7 and 3. 
Proof of (ii). Using again Lemmas 7 and 3, for any i£X, 

A N (x)<2C 2 \$ t \ 00 \p t \ osc (h), 

lim sup A n (x) < I'&tPt^t l°° hmsup | (G^ T (x, h)) 2 — G 2 T (x, h)\ = 0, P-a.s. 

N— S-oo N-^-oo 

These two inequalities combined with An(x) > allow for applying the 
Lebesgue dominated convergence theorem, verifying condition (ii). 
Proof of (hi). We have 



lim [E\B N (x)]dx= lim E 

N^oo J N^oo 



&-iK(0 I Pt(;x)dx 



(b) , - , (c) 

= (pt-lipt) = 



<pt-i(#t(-)pt(-,x))dx 



' lim E[^ 1 (^(-)pt(-,x))]dx 
lim E[i?Ar(x)] dx, 

N-*oo 

where (a) and (c) are consequences of Fubini's theorem and (b) and (d) 
follows from the L 1 -convergence of (h) to 4>t(h) (see Theorem 5) with 
h(-) = M-) and h(-)=M-)Pt(-,x). 

Thus, (57) holds, yielding that j A^{x)dx — > as N tends to infinity. 
This in turn implies (56) via the inequality 

\<Pt-iK T (;h)-v t>T (;h)}\< j A N (x)dx. 

This establishes (51) and therefore completes the proof. □ 



The weak convergence of y/~N(<j)Q. T , T (h) — </>o:T|t(^)) f° r the FFBS algo- 



rithm implies more or less immediately the one of y/N$* TlT (h)- 00) 
for the FFBSi algorithm. 
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Corollary 9. Under the assumptions of Theorem 8, 
y/N$"(h)-<f> ., TlT (h)) 



(58) 



A JV(0, 4>l. T \ T [h - 4>o-.T\T{h)] + r 0:T | T [/l - 0O:T|t(M])- 



Proof. Using (13) and the definition of (f>Q. T ^ T (h), we may write 

IN 



N{<t>g T \ T {h)-<t> :T\ T {h)) 

N 



+ V / iV« T | T (M-0O : T|TW)- 

Note that since { Jq.^^L^ are i.i.d. conditional on J-^ , (58) follows from (47) 
and direct application of [15], Theorem A. 3, page 2360, by noting that 

N ( 

n- 1 £e[{a(£v . . , 4 T ) - nnti , ■ ■ ■ , e^)i^]} 2 i^] 

= (^ofrlT^ - ^ofrlTl^)]) 2 (^0:T|T[^ - 4>Q:T\T{h)]) 2 ■ □ 

4. Time uniform bounds. Most often, it is not required to compute the 
joint smoothing distribution but rather the marginal smoothing distribu- 
tions <f> s \T- Considering (8) for a function h that depends on the compo- 
nent x s only, we obtain particle approximations of the marginal smoothing 
distributions by associating the set {Cs}f = i of particles with weights ob- 
tained by marginalizing the joint smoothing weights according to 



i s+1 =\ t T =i u=s+ i Lf=i w u -i m lVi'^ I 1 



It is easily seen that these marginal weights may be recursively updated 
backward in time as 



7[ EiLi w l m (d^+i) 



In this section, we study the long-term behavior of the marginal fixed- 
interval smoothing distribution estimator. For that purpose, it is required to 
impose a type of mixing condition on the Markov transition kernel; see [5] 
and the references therein. For simplicity, we consider elementary but strong 
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conditions which are similar to the ones used in [9], Chapter 7.4, or [3], Chap- 
ter 4; these conditions, which points to applications where the state space X 
is compact, can be relaxed, but at the expense of many technical difficulties 
[4, 37, 38, 40]. 

Assumption 4. There exist two constants < <r_ < a + < oo, such that, 
for any (x, i')eXx X, 

(60) cr_ < m(x, x') < a + . 

In addition, there exists a constant c_ > such that, f x(dxo)go(xo) > c_ 
and for all t > 1, 

(61) inf / M(x,dx')g t (x') > c_ > 0. 

x£X J 

Assumption 4 implies that z'(X) < oo; in the sequel, we will consider with- 
out loss of generality that i^(X) = 1. Note also that, under Assumption 4, 
the average number of simulations required in the accept-reject mechanism 
per sample of the FFBSi algorithm is bounded by a + /o-. 

The goal of this section consists in establishing, under the assumptions 
mentioned above, that the FFBS approximation of the marginal fixed inter- 
val smoothing probability satisfies an exponential deviation inequality with 
constants that are uniform in time and, under the same assumptions, that 
the variance of the CLT is uniformly bounded in time. 

For obtaining these results, we will need upper-bounds on G^ T and Gt t T 
that are more precise than the ones stated in Lemmas 3 and 7. For any 
function h G ^(X) and s <T, define the extension IT^/i £ J r b(X r+1 ) of h 
to X T+1 by 

(62) n StT h(x 0:T ) = h(x s ), x 0:T eX T+1 . 

Lemma 10. Assume that Assumptions 1~4 hold with T = oo. Let s <T . 
Then, for all t,T, N > 1, and h E ^(X), 

(63) |Gj r T (-,n s>T /i)|oo<P |t " 8| osc(7i)|^T(-,l)|oo, 
where Ct,T is defined in (26) and 

(64) 10 = 1-—. 

Moreover, for all t, T > 1, and h € J-"b(X), 

(65) |G t ,T(-,n SiT / l )| 00 </" s losc(/ l )|£ i , T (-,l)|oo. 



(66) 
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Proof. Using (27) and (29), 

G? T (x,TL s , T h) C? T (x,U s , T h) T (;ILs,Th)] 



£t, T (x,l) £&(*,!) (-,!)] 



To prove (63), we will rewrite (66) and obtain an exponential bound by 
either using ergodicity properties of the "a posteriori" chain (when t < s), 
or by using ergodicity properties of the backward kernel (when t > s). 

Assume first that t < s. The quantity Lt t T(xo;t,H s ,Th) does not depend 
on xo:i-i so that by (24) and definition (20) of L^t, 

C^ T (x t ,U S)T h) = L t , T (xo:t,~R s ,Th) 

(67) =/•••/ ' M ( 

dx u )g u (x u ) h(x s ) 

J J \u=t+l / 

= Ct > T(x t ,Ii a)T h). 
Now, by construction, for any t < s, 

(68) C t -i,T(xt-i,^ s ,Th) = J M(x t . 1 ,dxt)gt(x t )C tiT (x t ,U S! Th). 
The relations (66), (67) and (68) imply that 

G%r(x, n s , T h) _ /4£ t , T (-, TLs iT h)] m'[A,t(-, n S)T /i)] 



(69) 



Ct, T (x,l) m[A,t(-,1)] /i'[£ t)T (-,l)] 



where // = f S x and // is the nonnegative finite measure defined by 

fi'(A) = J J ^(dxt-iWixt^dxMx^lAixt), A e £>(X). 

Now, for any finite measure fi on (X,£>(X)), the quantity 

/4£t,r("> n s,T/Q] 
m[A,t(-,i)] 

J ■■■J Kdx t )l\l =t+1 M(x u -i,dx u )g u (x u )h(x s ) 



/ • • • / M(da;t) ]l«=i+i M(x u -i,dx u )g u (x u )h(x s )C S)T (x s , 1) 
/ • ■ • / n(dx t ) ]lw=t+i M(x u -i,dx u )g u {x u )£, St T(x s , 1) 

may be seen as the expectation of h(X s ) conditionally on Y^-t, where Xt is 
distributed according to A i— >■ /i(^4) //i(X). Under the strong mixing condition 
(Assumption 4), it is shown in [12] (see also [9]) that, for any t < s <T, any 
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finite measure \i and fj,' on (X,B(X)), any function h £ ^(X), that 
/ • ■ •/ n(dx t ) n«=t+i M(x u - 1 ,dx u )g u (x u )h(x s )£ StT (x s , 1) 



/ • • • / ^(dxi) n«=t+i ^(sw-i 5 dx u )g u (x u )£ Si T(x s , 1) 
/ • ■ • / //(ctet) n«=t+i M(x u _i, dx u )g u {x u )h{x s )C s , T {x s , 1) 



/ • ■ • / //'(dxt) n«=t+i M(x u ^ 1 ,dx u )g u (x u )C StT (xs, 1) 
< p s ~ l osc(h), 

where p is defined in (64). This shows (63) when t is smaller than s. 
Consider now the case s <t <T. By definition, 

C^ T (x t ,U StT h) = ■■ L t>T {x .. u Tl S) Th) B 0f_ 1 ( x «> dxu-i) 

(70) 

= / ••• / £t,T(x t ,l) B ?i Jv_ i (x M ,dx u _i)/i(a; s ), 
^ u=s+l 

where the last expression is obtained from the following equality, valid for 
s < P. 

T 

dx u )g 

u \Xu ) 

•> J u=t+l 

= h(x a )Ct )T {xt,l). 
Moreover, combining (33) and (70), 

^[££ 1)T (-,n s , T /0] 

(j)^L l (du t ^ 1 )M(u t ^i,dx t )g t (x t )C^ T (x t ,U StT h) 

<l)f_ x (dut-\)M(ut-i,dxt)gt(xt)C,t,T{xu 1) 
t 

x B^jv_ i (x n ,(ix lt _i)/i(x s ). 

u=s+l 



By plugging this expression and (70) into (66), we obtain 
G^ T (x,U StT h) _ r fffjjdxt) fAdxt) \ 



CtT{X,±) 



f f\fJ.(dx t ) V(dXt)\ TT T3 / i 



i)h(x s ) 
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with (J,(dxt) = 5 x (dxt)£t,T(%t,l) and // being the nonnegative measure de- 
fined by 

(j!{A)= / ^ 1 [m(-,x t )]5t i (x i )£ i>T (x t ,l)l J 4(x t )dxt. 



Under the uniform ergodicity condition (Assumption 4) it holds, for any 
probability measure 77 on (X, 23 (X)), and any A G B (X), 

J r]{dx')m(x,x) a + 

thus, the transition kernel B,j is uniformly Doeblin with minorizing constant 
a_/cr_)_ and the proof of (63) for s <t<T follows. The last statement of the 
Lemma follows from (63) and the almost-sure convergence 

lim G? T (x,h) = G tT {x,h), P-a.s., 
for all x G X, which was established in Lemma 7. □ 

4.1. A time uniform exponential deviation inequality. Under the strong 
mixing Assumption 4, a time uniform deviation inequality for the marginal 
smoothing approximation can be derived using the exponentially decreasing 
bound on the quantity G^ T obtained in Lemma 10. 

Theorem 11. Assume Assumptions 1-4 hold with T = oo. Then, there 
exist constants < B, C < oo such that for all integers N , s, T, with s <T, 
and for all £ > 0, 

(71) P[|< T (fc) - <Ps\ T (h)\ >e}< Be - CN ^^ h \ 



(72) n\$s\ T (h) ~ <t>s\Ah)\ >e}< Be~ Cm / osc W 

i T (h) and 4>^ T ( 



where <f>^ T (h) and 4>f< T (h) are defined in (6) and (14)- 



Letting s = T in Theorem 11 provides, as a special case, the (already 
known) time uniform deviation inequality for the filter approximation; how- 
ever, the novelty of the bounds obtained here is that these confirm the sta- 
bility of the FFBSm and FFBSi marginal smoothing approximations also 
when s is fixed and T tends to infinity (see [9] for further discussion). 

Proof of Theorem 11. Combining (27) with the definition (20) and 
Assumption 4 yields, for all x G X, 

\ 0~- £>tT(x,l) 

(73) — < ' ; / < i. 

0+ \*~t,TV, l)|oo 
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Let h£j 7 h {X T+1 ) and assume without loss of generality that 4>o : t\t(^) = 0- 
Then, (21) implies that 0o[Xo,t("j h)] = and the first term of the decom- 
position (22) thus becomes 



(74) 



^[L , T (, 1)] N -i E f =Q |^) 5o (^o,t(^ 1) ' 



where (^o)iLi are i-i-d- random variables with distribution p$. Noting that 
Lo,t = A),T we obtain an exponential deviation inequality for (74) by ap- 
plying Lemma 4 with 



v 



V 



i=0 



dp 



b = x[go(-)C , T (;l)}/\C , T (;h)\ 00 , 

Here, condition (I) is trivially satisfied and conditions (II) and (III) follow 
from the Hoeffding inequality for i.i.d. variables. 

According to (22) and (28), it is now required, for any 1 < t < T, to derive 
an exponential inequality for 



Note first that, using (73), we have 

v+\ a^ 1 ZlMGM&n s , T h)/\c t , T (-, i) 



\ N i 

H.T\ 



(J- 



We use again Lemma 4 with 

N 

a N = TV™ 1 ^utGZriti, U s , T h)/\C t , T {; 1) 

£=1 
N 

b N = N- 1 Y J 4, 

l/3 = c_/|^| 00 . 



def 



Assumption 4 shows that b > /3 and Lemma 10 shows that |a7v/&Ar| < M 
p\t-s\ qsc(/i), where p is defined in (64). Therefore, condition (I) of Lemma 4 
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is satisfied and the Hoeffding inequality gives 

N 



F[\b N -b\ >s] <E 



=1 



>£ 



■f t-1 



< 2exp(-2A^e 2 /l^ 



2 ) 



establishing condition (II) in Lemma 4. Finally, Lemma 10 and the Hoeffding 
inequality imply that 



\a N \ >e] <E 



N 



< 2exp|^-2 

Lemma 4 therefore yields 

a N 



Ne 2 



\uJt\^oP 2 ^ S loSC 2 (/l) 



>e) <2exp 



2exp 



Ne 2 c 2 _ 



> e 
Ne 2 



-r-N 
J~ t-1 



2 M 2 



so that 



P(|Af T |>e)<2exp 



2osc 2 (/ l )p 2 l*- s l|^|2 |^|; 
iVeVg 2 



A time uniform exponential deviation inequality for X^*=i -^t,T then follows 
from Lemma 13 and the proof is complete. □ 

4.2. A time uniform bound on the variance of the marginal smoothing dis- 
tribution. Analogous to the result obtained in the previous section, a time 
uniform bound on the asymptotic variance in the CLT for the marginal 
smoothing approximations can, again under the strong mixing Assumption 4, 
be easily obtained from the exponentially decreasing bound on Gt,T stated 
and proved in Lemma 10 for the quantity. 

Theorem 12. Assume Assumptions 1-4 hold with T = oo. Then, for 
all s <T, 

(\ 2 1 2 
— (l VSUpj^loo ) SUp|w t |ooOSc(/l) J ^, 
<r- v t>i > t>o J l-fr 



where T 0:T i T is defined in (48). 



In accordance with the results of the previous section, letting s = T in 
the previous theorem provides a time uniform bound on the asymptotic 
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variance for the filter approximation; nevertheless, as mentioned previously, 
the situation of interest for us is when s is fixed and T goes to infinity. 

PROOF of Theorem 12. Combining (73) and (65) with po(wo) = 1 
yields 

— 2T~rx? — r~Tu — - — wo ooosc(hy 

PoN(-) £ o,t(-,1)] V ct - 
Moreover, by inserting, for any < t < T, the bound obtained in (65) into 
the expression (49) of vt,T we obtain 

^i(t)j,T(',n S[T li)))^_i(i)i) fa + It-.lV 
-o—r? r~^Yi — Fi|oo|wt| 00 osc(/i)p l 1 . 

Finally, plugging the two bounds above into (48) gives 

r 0:T | T [n SiT /i] < (—(l Vsup|^| 00 )sup|6j t | 00 osc(/i)) (Vp 2|1 ~ s| ), 

which completes the proof. □ 

5. Proofs of Propositions 1 and 2. Having at hand the theory estab- 
lished in the previous sections, we are now ready to present the proofs of 
Propositions 1 and 2. 

Proof of Proposition 1. The average number of simulations required 
to sample conditionally on Q^ + i is cr + Q, s / Yl u =l w " m (£" > ^s+V)- Hence, 
the number of simulations required to sample {Jg}gLi has conditional 
expectation 

N n 

We denote u\ T d = P[J S X = i|J^] and uf. s+1]T = f P[Ji = I, J] +l = and 
write 

N 



i=i Ef=iw|m(^,C + i) 

c+fi* 2^ z_> — ^ — : — — — 



5s+l J 
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For the bootstrap particle filter, wf = 5s(£s)j Theorem 5 then implies that 
Q s /N — >N^oo <P s \s-i(9s) and 
N N 



Besides, 



Ajv^oo 1 1 <t>s:s+l\T(dx 3:s+1 ) 1 . 

j j ysy^sj lll '\- i 'si -^s+ij 

^s:s+l|T(^s:s+l) 7 s 7 T 

, /j flf s (x s )M(x s ,dx s+ i) 

s i s _i(dx s ) — - — r — r g s+1 (x s+ i) 

g s (x s )m(x s ,x s+ i) 



T 

x 

u=s+2 



M{x u -i,dx u )g u {i 



-i(dx s )g 8 (x 8 ) M( dx u )g u (x 

a I 



u=s+l 

= J dxs+i Ul= s +2 1 M (x u -i,dx u )g u (x u ) 
/ • • • / </>s\s-i(dx s )g s (xs) I\l= s +i M(x u -i,dx u )g u (x u ) 

Similarly, in the fully adapted case we have u) l s = 1 for all i G {1, . . . , N}; 
thus, £l s = N and 

N N 

EE^ + nT^ m( ^ u:+i) 

-^AT^oo f f 4>s:s+l\T{dXs:s+l) 7 T 

J J m(x s ,x s+ i) 
/ • • • / 0s+i(z s+ i) cfa s +i nI=s+2 / M(a; u _i,da; u )g w (a; u ) 
/ • ■ •/ 4> s (dx s ) nr= s +i M(x u _i, dx u )g u (x u ) 

In both cases, the numerator can be bounded from above by 

/ • • • / <l>s\s-x(dx s )g s (x s ) nLs+i M(a; u _i, £fec„)</ u (a; w ) 
if J g u (x u ) dx u < oo for all u > 0. □ 
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Proof of Proposition 2. Fix a time step s of the algorithm and 
denote by C s the number of elementary operations required for this step. 
For k £ {1, . . . , n}, let be the number of times that k appears in list L at 

time s in the 'while' loop. Let also iV" = f X^fcLi ^{T fc >«} be the size of L (i.e., 

the value of n at line 6) after u iterations of the 'while' loop, with N® = N. 
Then, using Proposition 14 there exists a constant C such that 

As n — > n(l + log(l + N/n)) is a concave, increasing function, it holds by 
Jensen's inequality that 

nc s ]< ^|>ra (i + lQ g (i + i^j) ) • 

Besides, 



E[i\?]=£P(T*>u)<2\^l-^J 



as (j-/a+ is a lower bound on the acceptation probability. Thus, 



OO / 

E[C S ]<CNJ2[1 



a -\ u ( -> ,!__/'•, , 1 N \^ A ' Ar °"+ 



l + log 1 + - _ r- - < 



a + j v v 0--<r-/<r+)"JJ ~ o-- '□ 



APPENDIX A: PROOF OF LEMMA 4 



Write 



a N 



b N 
Thus, 



<6- 1 



?>A 



|6-6jv| + & _1 M <0 -1 M|&-&jv| + /r 1 |a 7V | 



6jv 



>4£<l»-M>|^} 



|ajv| > y 



-a.s. 



from which the proof follows. 

APPENDIX B: TECHNICAL RESULTS 

Lemma 13. Let {Y n ^}f =1 be a triangular array of random variables such 
that there exist constants B > 0, C > 0, and p with < p < 1 satisfying, for 
all n, i S {1, . . . ,n}, and e > 0, 

IP(|^|>e)<£exp(-CeV 2i )- 
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Then, there exist constants B > and C > such that, for any n and e > 0, 



i=l 



> e < Be - 



Proof. Set 5 = y/tpl; one easily concludes that 



i=i 



> e < ^P(|y„,i| > eS^vV) < J B^exp(-C5- 1 e 2 i). 

/ i=l i=l 

Set eo > 0. The proof follows by noting that, for any e > eq, 

n 

exp(-C5- 1 e 2 i) < exp^S^e 2 ,) exp(-C\S~V)/(l - exp^S^e 2 ,)). 



i=l 



□ 



B.l. Description of the sampling procedure. In this section, we describe 
and analyze an efficient multinomial sampling procedure, detailed in Algo- 
rithm 2. Given a probability distribution (pi, . . . ,pj\r) on the set {1, . . . , N}, 
it returns a sample of size n of that distribution. Compared to the procedure 
described in Section 7.4.1 in [3], its main virtue is to be efficient for both 
large and small samples sizes: if n = 1, the complexity is 0(log(iV)), while 
if n = N, the complexity is 0(N). 

Proposition 14. The number of elementary operations required by Al- 
gorithm 2 is 0(n + nlog(l + N/n)). 

Proof. The order statistics at line 5 and the permutation at line 6 can 
be sampled using 0(n) operations; see [13], Chapter V and XIII. For each 
value of k between 1 and n, denote by Gk the number of times lines 11-13 
are executed. Observe that line 18 is executed the same number of times, and 
thus the number of elementary operations required by call to Algorithm 2 
is 0(n + Ylk=i @k)- But the value of I is increased during iteration k by at 
least 2 Gk — 1, and as the final value of I is at most equal to N, it holds that 



J^2 Gfc < N + n. 



k=l 



By convexity, 

expj 

which implies that 



log(2) 



J2 G * <-E 2 + - 



n , 

^G fc <nlogf 1 + 
k=i ^ 



N 



n 



/log(2). 



□ 
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Algorithm 2 Multinomial sampling 



for k from 1 to N do 

q k i- qk-l+Pk 
end for 

sample an order statistics Un\, . . . , Ur n ) of an i.i.d. uniform distribution 
uniformly sample a permutation a on {1, . . . , n} 
l<-0,r<-l 
for A; from 1 to n do 
d<- 1 

while U(k) — qr do 
/ «— r 

r «— min(r + 2 d ,N) 
d^d + l 
end while 
while r — I > 1 do 
[{l + r)/2\ 
if ?7(fe) > <?m then 

I ^— m 
else 

r <— m 
end if 
end while 



end for 
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